Abstract

The main aim of this study is to compare the factor structures,
corrected item-total correlations, and Cronbach-alpha internal
consistency coefficients obtained with different missing value
imputation methods, both when the data set has no missing values and
when it has missing values at different rates, in the context of
testing the construct validity of a scale. The study is basic research;
its research group consists of 200 teacher candidates who attended the
Department of Elementary Education at Ankara University, Faculty of
Educational Sciences, during the spring term of the 2008-2009 academic
year. The data were gathered with the Fatalism Scale (Sekercioglu,
2008), and exploratory factor analysis based on the principal
components analysis method was used. The findings showed that the
“single-factor” structure obtained from the original data set without
missing values was also obtained when missing values were imputed with
different methods, whereas imputation caused a decrease in explained
variance. A similar decrease was also seen in eigenvalues and
Cronbach-alpha internal consistency coefficients.
Many studies involve incomplete data. The empty or unanswered values
in a data set are called missing values (missing data), and they are a
problem most researchers face. Even though researchers try to obtain
complete data sets, this problem arises frequently when participants’
data are gathered with scales based on the self-report technique.

Missing data may occur for various reasons. For instance, participants
might accidentally leave some questions unanswered in long
questionnaires; mechanical failures may leave data unrecorded in
experimental procedures; or the research may concern a sensitive issue
(for instance, sexual behavior), and participants may use their right
not to answer such questions (Field, 2005). Garson (2008) groups these
reasons as fatigue, sensitivity, lack of knowledge, and other reasons,
adding that missing data can also result from missing records in
information obtained from archives. In addition, as stated by Van der
Ark and Vermunt (2007), some respondents cannot reach certain
questions in speed tests for lack of time. Respondents may also leave
some questions unmarked in performance tests because they do not know
the answer or avoid guessing (Finch & Margraf, 2008). To sum up,
scales aiming to determine cognitive, affective, and behavioral
qualities might include missing values for the reasons mentioned
above, which may affect the validity and reliability of the scores
obtained from such scales.

The seriousness of the missing value problem depends on whether the
missing values have a pattern, how many values are missing, and why
they are missing. The pattern of the missing values is a more serious
problem than their amount (SPSS, 2007; Tabachnick & Fidell, 1996). If
a large data set has few missing values that follow a random pattern,
the problem is not so serious, and different methods of handling the
missing values will give similar results. However, having many missing
values in small and medium-sized data sets causes serious problems.
Unfortunately, there is no criterion for deciding how many missing
values can be tolerated for a given sample size (Tabachnick & Fidell,
1996). Researchers may use alternative methods in handling missing
values. These methods can be studied under three basic groups:

Defining One or More Value(s) instead of the Missing Value, and
Excluding These Data from Analysis: A value can be defined in computer
programs as a missing value for a participant. The programs then
ignore the defined value; in other words, they do not include it in
the analyses (Field, 2005).

Deleting Subjects and Variables Including Missing Values: Another way
to handle missing values is to delete the subjects or variables that
cause problems because they have missing values. Each subject with a
missing value is excluded from the data file. If only a few subjects
have missing values, deletion is a good alternative (Mertler &
Vannatta, 2005). However, Carpita and Manisera (2008) emphasize that
deleting subjects with missing values may result in data loss and,
depending on the amount of missing values, may also cause substantial
bias because of the likely systematic differences between those who
answer and those who do not.

Another possibility is that the missing values are concentrated in a
few variables. In this situation, if the variables are not important
or central to the research problem, deleting them (excluding them from
the data set) can be considered. However, if the missing values are
numerous and distributed throughout the data set, deleting subjects
and/or variables causes serious data loss (Mertler & Vannatta, 2005).
Tabachnick and Fidell (1996) state that this may cause serious
problems particularly for groups in experimental designs, because
excluding even one subject from the data set will require corrections
for unequal group sizes. Moreover, if the subjects who have missing
values are not randomly distributed, deleting them may also skew the
distribution.


For these reasons, Fox-Wasylyshyn and El-Masri (2005) point out that,
unlike deletion, imputation for missing values preserves the sample
size.
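
As a rough illustration of why deletion can be costly, the sketch below applies listwise deletion to a small response matrix and computes the share of complete cases expected when every item loses the same fraction of responses independently. The function names, the sample responses, and the 15% rate are illustrative assumptions, not values taken from the cited sources.

```python
def listwise_delete(rows):
    """Keep only the subjects (rows) that have no missing (None) value."""
    return [row for row in rows if all(v is not None for v in row)]


def expected_complete_fraction(missing_rate, n_items):
    """Expected share of complete cases, assuming each item is missing
    independently with the same probability (an idealised assumption)."""
    return (1.0 - missing_rate) ** n_items


# Hypothetical responses of four subjects to three items.
responses = [
    [4, 5, 3],
    [2, None, 4],
    [5, 4, None],
    [3, 3, 3],
]
print(listwise_delete(responses))  # only the complete subjects remain
print(round(expected_complete_fraction(0.15, 10), 3))
```

Under these assumptions, only about one subject in five would remain complete on a 10-item scale, which is the kind of data loss the authors cited above warn about.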

Prediction of Missing Values/Imputation: Another way to handle missing
values is to predict them and use the predicted values in the basic
analyses. However, prediction and imputation can only be applied to
quantitative variables. The three most common methods of making these
predictions (imputations) are “prior knowledge”, “average (mean) value
imputation”, and “regression” (Mertler & Vannatta, 2005; Tabachnick &
Fidell, 1996).

Using prior knowledge means that the researcher replaces missing
values with new values based on previous knowledge (Mertler &
Vannatta, 2005). Another alternative is to calculate the mean of the
available data and impute this mean for each variable that has missing
values. This process is applied before the basic analyses. If the
researcher does not have other information, mean imputation is the
best way of estimation. The third alternative is the regression
approach. In regression, one or more independent variables are used to
develop an equation that predicts the value of the dependent variable.
In missing value estimation, the variable that has missing values
becomes the dependent variable. Subjects who have complete data are
used to develop the estimation equation. Once the equation is
obtained, it is used to estimate the missing values of the dependent
variable for subjects who have missing data (Tabachnick & Fidell,
1996).
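
The mean and regression approaches described above can be sketched as follows. This is a minimal illustration under stated assumptions: missing values are coded as None, a single complete predictor is used for the regression, and the function names and data are hypothetical.

```python
from statistics import mean


def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    m = mean(v for v in values if v is not None)
    return [m if v is None else v for v in values]


def regression_impute(x, y):
    """Fill None entries of y from the least-squares line y = a + b*x.

    The line is fitted only on subjects whose y is observed; x is
    assumed to be complete (a simplifying assumption of this sketch).
    """
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mx = mean(p[0] for p in pairs)
    my = mean(p[1] for p in pairs)
    sxx = sum((xi - mx) ** 2 for xi, _ in pairs)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in pairs)
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    return [a + b * xi if yi is None else yi for xi, yi in zip(x, y)]


# Hypothetical scores: item_b is missing for the 2nd and 5th subjects.
item_a = [1, 2, 3, 4, 5]
item_b = [2, None, 6, 8, None]
print(mean_impute(item_b))
print(regression_impute(item_a, item_b))
```

Mean imputation gives every missing case the same value, while regression imputation lets the imputed value depend on the subject's score on the predictor.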

Garson (2008) states that there is no simple rule for choosing among
leaving missing values untreated, deleting individuals who have
missing values, and imputation, and notes that deleting individuals
who have missing values will not be a problem when less than 5% of the
values are missing in large samples. Howell (2009) emphasizes that
each method of handling missing values has both advantages and
disadvantages, and that these should be taken into consideration.

According to Huisman (2000), there are many different ways to deal
with missing values, and imputation is one of the most popular
strategies for dealing with missing values in the items of a scale. In
imputation, empty cells in the data set are filled with estimated
values. However, imputation is sometimes dangerous. Huisman (2000)
reports that Dempster and Rubin (1983) describe the situation as
follows: at first, imputation seems a seductive idea, yet it is also a
dangerous one. It is seductive because it relieves the researcher of
the pressure created by the belief that the data should be complete.
It is dangerous because the results of the analysis may be biased when
there are systematic differences between those who answer and those
who do not, so wrong results can easily arise. Despite its danger,
imputation is a popular technique because it gives researchers the
opportunity to work with a complete data set. However, inexpertly
conducted imputation may produce much worse results than doing
nothing. For this reason, one should be careful (Huisman, 2000).

There are five imputation options in SPSS, and the imputation methods
handled in this study are limited to these five. They can be briefly
summarized as follows (Mertler & Vannatta, 2005):

1. Series Mean: It is the mean of all subjects on a given variable,
and it is the default option in the program.

2. Mean of Nearby Points: It is the mean of the nearby (surrounding)
values. The number of nearby values is set with the “span of nearby
points” option; the default span in the program is 2. In other words,
the arithmetic mean of the complete observations above and below the
missing value is calculated, and this value is imputed in place of the
missing value.

3. Median of Nearby Points: It is the median of the 
nearby (surrounding) values. The researcher can 
also determine the number of the surrounding 
values. In other words, median is calculated by 
using complete observation values under and 
above the missing data, and this value is imputed 
instead of the missing data. 

4. Linear Interpolation: The missing value is interpolated from the
last complete observation before it and the first complete observation
after it. If the first or last observation in the series is missing,
no value can be imputed for it.

5. Linear Trend of Point: The value is determined in accordance with
the trend of the existing series (for instance, if the values tend to
increase from the first subject to the last). The series is regressed
on an index variable scaled from 1 to n, and missing values are
replaced with the values predicted by this trend.
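
The five options can be sketched in a few lines of plain Python. This is an illustrative approximation, not SPSS's implementation: the function names are hypothetical, missing values are represented by None, and the handling of edge cases (for example, using fewer neighbours when a full span is not available) is an assumption made for the sketch.

```python
from statistics import mean, median


def series_mean(xs):
    """Replace every None with the mean of all observed values."""
    m = mean(v for v in xs if v is not None)
    return [m if v is None else v for v in xs]


def _neighbours(xs, i, span):
    """Up to `span` observed values on each side of position i (span >= 1)."""
    before = [v for v in xs[:i] if v is not None][-span:]
    after = [v for v in xs[i + 1:] if v is not None][:span]
    return before + after


def nearby_points(xs, span=2, stat=mean):
    """Mean (or median, via stat=median) of nearby observed values."""
    out = []
    for i, v in enumerate(xs):
        if v is None:
            near = _neighbours(xs, i, span)
            v = stat(near) if near else None  # stays missing if no neighbour
        out.append(v)
    return out


def linear_interpolation(xs):
    """Interpolate between the nearest observed values on both sides."""
    out = list(xs)
    valid = [i for i, v in enumerate(xs) if v is not None]
    for i, v in enumerate(xs):
        if v is None:
            left = [j for j in valid if j < i]
            right = [j for j in valid if j > i]
            if left and right:  # leading/trailing gaps stay missing
                lo, hi = left[-1], right[0]
                out[i] = xs[lo] + (xs[hi] - xs[lo]) * (i - lo) / (hi - lo)
    return out


def linear_trend(xs):
    """Regress observed values on the index 1..n and predict the gaps."""
    pts = [(i + 1, v) for i, v in enumerate(xs) if v is not None]
    mi = mean(p[0] for p in pts)
    mv = mean(p[1] for p in pts)
    b = sum((i - mi) * (v - mv) for i, v in pts) / sum((i - mi) ** 2 for i, _ in pts)
    a = mv - b * mi
    return [a + b * (i + 1) if v is None else v for i, v in enumerate(xs)]


# Hypothetical item scores in subject order; the third response is missing.
item = [1, 3, None, 5, 4, 2]
for name, filled in [
    ("series mean", series_mean(item)),
    ("mean of nearby points", nearby_points(item)),
    ("median of nearby points", nearby_points(item, stat=median)),
    ("linear interpolation", linear_interpolation(item)),
    ("linear trend", linear_trend(item)),
]:
    print(name, filled[2])
```

Note that all methods except the series mean depend on the order of the subjects in the data file, which is arbitrary for questionnaire data of the kind analysed here.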

Many studies on missing values have been conducted in other countries.
For example, Raymond and Roberts (1987) compared methods of handling
missing values in some selected studies. Fichman and Cummings (2003)
studied multiple imputation in multivariate analysis; Grung and Manne
(1998), Sanguinetti and Lawrence (2006), and Raiko, Ilin and Karhunen
(2007) studied missing values in principal component analysis; Carpita
and Manisera (2008) studied missing value imputation in research with
Likert-type scales; and Robitzsch and Rupp (2009) studied the effect
of missing values on detecting differential item functioning
(comparing the Mantel-Haenszel and logistic regression techniques).
However, the way missing values or missing value procedures were
examined in those studies differed from the present study. As
described below, missing values were generally considered and examined
in terms of the “Missing Completely at Random-MCAR”, “Missing at
Random-MAR” and “Missing not at Random-MNAR” mechanisms. For instance,
in a study by Shrive, Stuart, Quan and Ghali (2006), 1580 participants
were given the Zung Depression Scale, whose items were answered from 1
to 4, and those with scores higher than 40 were described as
individuals with depressive syndromes. The MCAR, MAR and MNAR
mechanisms were examined for the missing values, and six imputation
methods were studied: multiple imputation, single regression,
individual mean, overall mean, the participant’s preceding response,
and random imputation from 1 to 4. Multiple imputation was found to be
the best method. It was also concluded that individual mean imputation
was an acceptable method and easy to interpret.

When studies in Turkey are examined, it is clear that there is no
research focused directly on missing values, although missing values
have been mentioned in some data mining research (Kizilkaya-Aydogan,
Gencer, & Akbulut, 2008). Only Oguzlar (2001) examined listwise
deletion, pairwise deletion, EM and regression imputation techniques,
and the missing value mechanisms in SPSS, in a study that included
7452 observation values and 21 constant variables from a 54-variable
database about 207 countries on the World Bank webpage. The missing
value mechanisms were discussed as “MCAR”, “MAR” and
“Nonignorable-NI”, and the mechanism to which the existing missing
values belonged was identified.

In the light of the discussion above, the problem of this study is to
examine missing values in terms of testing the construct validity and
reliability of a scale. In social science research in general, and in
educational and psychological studies in particular, scales are
frequently developed, their technical qualities such as validity and
reliability are established, or existing scales are used to collect
data and report results; it is therefore essential to examine to what
extent missing values affect these procedures. The fact that there is
no research in Turkey on missing values in related fields within the
framework of these technical qualities highlights the need for such an
examination.

Aim 

The general aim of this study is to compare the corrected item-total
correlations, Cronbach-alpha internal consistency coefficients, and
factor structures obtained with the different methods used in
imputation for missing values (Series Mean Imputation, Mean of Nearby
Points Imputation, Median of Nearby Points Imputation, Linear
Interpolation, Linear Trend of Point), in the condition where there
are no missing values and in the conditions where there are missing
values at different rates (ranging approximately between
15.00%-20.00% and between 0.00%-50.00%), in terms of testing the
construct validity of a scale.

Method 

Research Model and Group 

The research compares the exploratory factor analysis results obtained
with the principal components analysis method, which is used in
determining the factor structures of scales, under conditions in which
missing values are imputed with different methods. For this reason,
the study is basic research, that is, theoretical work aimed at
producing knowledge. The research group consists of 200 teacher
candidates who attended the Department of Elementary Education at
Ankara University, Faculty of Educational Sciences, during the spring
term of the 2008-2009 academic year.


Instrument 

The data of this study were gathered with the Fatalism Scale. The
Fatalism Scale, whose validity and reliability studies were conducted
on a group of teacher candidates by Sekercioglu (2008), consists of 10
items grouped under a single factor. The single-factor structure
obtained by exploratory factor analysis was also confirmed by
confirmatory factor analysis. The scale has a 5-point Likert-type
format and is scored from “completely inappropriate (1)” to
“completely appropriate (5)”. Therefore, higher scores indicate a
higher level of fatalistic thinking. The Cronbach-alpha internal
consistency coefficient of the Fatalism Scale was found to be .81. The
test-retest reliability obtained from two applications conducted on a
group of 40 people four weeks apart was r = .88 (p < .01).
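
For readers unfamiliar with the reliability statistics used in this study, the sketch below shows one standard way to compute Cronbach's alpha and corrected item-total correlations (each item correlated with the sum of the remaining items). The function names and the small response matrix are hypothetical; the study's own values were obtained with SPSS on the 10-item scale.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a subjects-by-items matrix of complete data."""
    k = len(rows[0])
    item_vars = sum(variance(col) for col in zip(*rows))
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_vars / total_var)


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(rows):
    """Correlate each item with the total score of the other items."""
    result = []
    for j in range(len(rows[0])):
        item = [row[j] for row in rows]
        rest = [sum(row) - row[j] for row in rows]
        result.append(pearson(item, rest))
    return result


# Hypothetical answers of five subjects to three 5-point items.
answers = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
]
print(round(cronbach_alpha(answers), 2))
print([round(r, 2) for r in corrected_item_total(answers)])
```

The correlation is called corrected because the item itself is removed from the total before correlating, so an item cannot inflate its own coefficient.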

Analysis 

Exploratory factor analysis based on principal 
components analysis method was applied in 
order to test the construct validity of a scale under 
different conditions related to missing values in 
this study. Moreover, item-total correlations and 
Cronbach-alpha internal consistency coefficients 
for different conditions were also estimated. 
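
The quantities reported in the findings (first eigenvalue, explained variance, and factor loadings) can be illustrated for a single component with the sketch below. It extracts only the first principal component of the item correlation matrix by power iteration; this is a simplified stand-in for the SPSS procedure used in the study, with hypothetical function names, and it assumes complete data, no constant items, and mostly positively correlated items.

```python
import math
from statistics import mean, pstdev


def correlation_matrix(rows):
    """Pearson correlation matrix of the items (columns) of `rows`."""
    n = len(rows)
    z = []
    for col in zip(*rows):
        m, s = mean(col), pstdev(col)
        z.append([(v - m) / s for v in col])  # standardised item scores
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def first_component(R, iterations=500):
    """Largest eigenvalue of R and the loadings on the first component."""
    p = len(R)
    v = [1.0 / math.sqrt(p)] * p  # start from equal weights
    for _ in range(iterations):
        w = [sum(R[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(R[i][j] * v[j] for j in range(p)) for i in range(p))
    loadings = [math.sqrt(eigenvalue) * x for x in v]
    return eigenvalue, loadings


# Hypothetical answers of five subjects to three items.
answers = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
]
R = correlation_matrix(answers)
eigenvalue, loadings = first_component(R)
print(round(eigenvalue, 2))                  # first eigenvalue
print(round(100 * eigenvalue / len(R), 1))   # explained variance in percent
print([round(l, 2) for l in loadings])       # loadings on the single factor
```

Because the analysis is run on the correlation matrix, the eigenvalues sum to the number of items, so the explained variance of the first component is its eigenvalue divided by the number of items.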

Procedure 

First, exploratory factor analysis based on the principal components
analysis method was applied to the original data set (n=200), which
did not have missing values, in order to achieve the aim of the
research. Afterwards, data sets with missing values were obtained by
randomly deleting some data from the original data set. The first data
set includes missing values varying approximately between 15.00% and
20.00% across the variables (items), whereas the second data set
includes missing values varying approximately between 0.00% and 50.00%
across the variables (items). The imputation processes were then
carried out on the assumption that the missing values were randomly
distributed throughout the variables in both data sets.
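
The deletion step can be sketched as follows. The per-item rates below are hypothetical, since only their approximate range is reported; the simulated answers, the function name, and the fixed random seeds are likewise assumptions made so that the example is reproducible.

```python
import random


def delete_at_random(rows, rates, seed=0):
    """Return a copy of `rows` in which, for item j, a fraction rates[j]
    of the subjects (chosen completely at random) is set to None."""
    rng = random.Random(seed)
    n = len(rows)
    out = [list(row) for row in rows]  # leave the original data untouched
    for j, rate in enumerate(rates):
        n_missing = round(rate * n)
        for i in rng.sample(range(n), n_missing):
            out[i][j] = None
    return out


# Simulated complete data: 200 subjects, 10 items, answers from 1 to 5.
rng = random.Random(1)
complete = [[rng.randint(1, 5) for _ in range(10)] for _ in range(200)]

# Hypothetical per-item missing rates between 15% and 20%.
rates = [0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.15, 0.17, 0.18, 0.20]
with_missing = delete_at_random(complete, rates)
counts = [sum(row[j] is None for row in with_missing) for j in range(10)]
print(counts)  # number of deleted answers per item
```

Because the deleted cells are chosen independently of the answers themselves, the resulting missingness corresponds to the MCAR mechanism mentioned in the literature review.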

Findings 

Findings of the First Data Set 

The first data set includes missing values varying between 15.00% and
20.00%. When the findings of the factor analysis conducted on this
data set were evaluated as a whole, the item with the lowest factor
loading was the 6th item, both in the condition without missing values
and in the conditions where missing values were imputed with different
methods. It should be noted that the 6th item showed a factor loading
below .30 under the “Linear Interpolation” condition, and it was
excluded from the scale for this reason. The item with the highest
factor loading under all conditions was the 5th item. The 9th and the
5th items had equal and highest factor loadings in the original data
set without missing values, whereas this was not seen under any other
condition.

When the eigenvalues and explained variances were evaluated, it was
seen that the highest values were obtained in the condition without
missing values. Imputation for missing values caused a decrease in
explained variance. Among the imputation conditions, the lowest
variance was explained under the “Linear Interpolation” condition,
whereas the highest variance was explained under the “Linear Trend of
Point” condition. However, when the 6th item was excluded in the
“Linear Interpolation” condition, the lowest explained variance
appeared under the “Median of Nearby Points” imputation condition.

When the corrected item-total correlations and Cronbach-alpha internal
consistency coefficients estimated from the first data set were
evaluated, it was seen that the lowest corrected item-total
correlations across conditions ranged between .20 and .38, and the
highest ranged between .59 and .79. However, when the item-total
correlations were recalculated for the factor structure obtained after
item 6 was excluded under the “Linear Interpolation” condition, the
lowest corrected item-total correlations ranged between .22 and .42.
The Cronbach-alpha internal consistency coefficients ranged between
.78 and .85. Imputation for missing values caused a decrease in
Cronbach-alpha internal consistency coefficients, as it did in
eigenvalues and explained variance.

Findings of the Second Data Set 

The second data set includes missing values varying between 0.00% and
50.00%. When the findings of the factor analysis conducted on this
data set were evaluated as a whole, the item with the lowest factor
loading was the 6th item under all conditions, whereas the item with
the highest factor loading was the 5th item. However, it should be
noted that the 6th item showed a factor loading below .30 under the
“Mean of Nearby Points”, “Median of Nearby Points”, and “Linear
Interpolation” conditions, and it was excluded from the scale. When
the analyses were repeated for these conditions, the item with the
lowest factor loading was the 10th item under all conditions, whereas
the item with the highest factor loading was again the 5th item.

When the eigenvalues and explained variance values were evaluated, it
was seen that the highest values were obtained in the condition
without missing values. Imputation for missing values caused a
decrease in explained variance. Among the imputation conditions, the
lowest variance was explained under the “Linear Interpolation”
condition, whereas the highest variance was explained under the
“Linear Trend of Point” condition. However, when the analyses were
repeated under the “Mean of Nearby Points”, “Median of Nearby Points”,
and “Linear Interpolation” conditions with the 6th item excluded, the
lowest explained variance appeared under the “Series Mean” imputation
condition.

When the corrected item-total correlations and Cronbach-alpha internal
consistency coefficients were evaluated, it was found that the lowest
corrected item-total correlations across conditions ranged between .19
and .51 before the 6th item was excluded, and between .25 and .51
after the exclusion. The highest corrected item-total correlations did
not change with the inclusion or exclusion of the 6th item and ranged
between .56 and .90. The Cronbach-alpha internal consistency
coefficients ranged between .77 and .91, and this range did not change
after the exclusion of the 6th item.

Discussion and Results 

In this study, the factor structures obtained with the different
methods used in imputation for missing values (Series Mean Imputation,
Mean of Nearby Points Imputation, Median of Nearby Points Imputation,
Linear Interpolation, Linear Trend of Point) were compared in the
condition where there were no missing values and in the conditions
where there were missing values at different rates, in terms of
testing the construct validity of a scale. Moreover, the corrected
item-total correlations and Cronbach-alpha internal consistency
coefficients related to the factor structures obtained under the
different conditions were estimated in both data sets, which had
different rates of missing values.

When the findings were evaluated, it was seen that the “single-factor”
structure of the examined scale, obtained from the original data set
without missing values, was also obtained under the conditions in
which missing values were imputed with different methods.

The first data set included missing values varying between 15.00% and
20.00%. In the analyses conducted on this data set, one item was
excluded from the scale under the “Linear Interpolation” condition
because it showed a low factor loading. Therefore, the 10-item scale
worked with 9 items under this condition. The second data set included
missing values varying between 0.00% and 50.00%. In the analyses
conducted on this data set, the same item that did not work under the
“Linear Interpolation” condition in the first data set was excluded
from the scale because it showed a low factor loading under the “Mean
of Nearby Points”, “Median of Nearby Points”, and “Linear
Interpolation” conditions. One point about the analyses of both data
sets is important. Although the study did not attempt to develop a
scale, the item was excluded in order to show the changes in the
factor structure of the scale under different conditions. In
accordance with the aim of the study, the factor analysis was repeated
after the item exclusion so that such changes could be observed.

When the findings of the exploratory factor analyses based on the
principal components analysis method, conducted on the two data sets
with different missing value rates, are evaluated as a whole, it can
be stated that the items with the lowest and highest factor loadings
are consistent in almost all conditions. A similar situation holds for
the corrected item-total correlations. The most important observation
in the construct validity examinations conducted in this study is that
imputation for missing values causes a decrease in explained variance.
This effect, which Mertler and Vannatta (2005) associate with mean
imputation, was observed in all imputation conditions. The same effect
is observed in eigenvalues and Cronbach-alpha internal consistency
coefficients; in other words, imputation causes a decrease in these
values as well.

When the data contain few missing values, data deletion might not
affect the power of the sample to represent the population. However,
when there is a high percentage of missing values in the data set,
disregarding such data may reduce the reliability of the model
structure and model estimations (Satici & Kadilar, 2009). The
literature describes methods such as missing value imputation, series
mean imputation, imputation for data production by another variable,
nearby points imputation, and weighted methods (Satici, 2009). In this
study, missing value imputation was performed by using series mean
imputation, imputation for data production by another variable
(median, mode), imputation by data production, and imputation by
nearby data production methods. It was observed that the factor
structure of the scale deteriorated under such imputations and that
both explained variance and the reliability criteria decreased. It has
been reported in the literature that these methods lead to systematic
errors (Satici, 2009). In this study, systematic errors may have had a
direct effect on the construct validity of the scale and an indirect
effect on its reliability. Donders, Heijden, Stijnen and Moons (2006)
suggested that data production by neighborhood might lead to biased or
deviant findings. When missing observations resemble nearby
observations, this approach occasionally produces consistent values;
at other times it produces values that are inconsistent both with the
complete observations within the same case and with the complete cases
in the data set.

The study examined the effectiveness of commonly used imputation
methods. It has been argued in the literature that there are more
effective methods that produce more realistic results. In recent
years, some studies have claimed that data production by “Hot Deck
Imputation”, “EM (Expectation Maximization)” and the “Regression
Method” is more effective than data production by classical methods
(Kayaalp & Polat, 2001; Ozel & Ata, 2007; Satici & Kadilar, 2009).
Considering the higher likelihood of biased data production with
classical methods, further research using the other imputation methods
suggested in the literature is recommended; a comparative study of
these methods will contribute to their use and extension.